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A method based on multicanonical Monte Carlo is applied to the calculation of large deviations 
in the largest eigenvalue of random matrices. The method is successfully tested with the Gaussian 
orthogonal ensemble (GOE), sparse random matrices, and matrices whose components are subject 
to uniform density. Specifically, the probability that all eigenvalues of a matrix are negative is 
estimated in these cases down to the values of ~ 10 -200 , a region where naive random sampling 
is ineffective. The method can be applied to any ensemble of matrices and used for sampling rare 
events characterized by any statistics. 
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I. INTRODUCTION 

Rare events caused by rare realization of impurities of- 
ten govern the properties of random systems and play an 
essential role in their study. Numerical computation of 
the probabilities of rare events is, however, computation- 
ally expensive. When the probability takes very small 
values, say, 10~ 15 or less, it is virtually impossible to 
calculate the correct probability value by naive random 
sampling. 

Recently, approaches based on dynamic Monte Carlo 
(Markov chain Monte Carlo) [3, 0] have been shown to 
be useful for sampling rare events and calculating large 
deviations in the corresponding statistics. The novelty of 
the approach is that a dynamic Monte Carlo algorithm 
is used for calculating sample averages over configura- 
tions of impurities, instead of computing thermal aver- 
ages. Successful examples in physics include applications 
in spin glass [!, H| , diluted magnets [j| , and directed ran- 
dom walks in random media Q. Some references 0-Q3 
also discuss applications in information processing and 
other engineering problems. 

The aim of this paper is to apply the method to sample 
rare events in random matrices. Random matrices have 
been a classical subject with a number of applications in 
physics and other fields [TlTjl3j . Specifically, large devia- 
tions in the maximum eigenvalue of random matrices is a 
subject of recent interest in various fields such as ecology 
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14 1 , cosmology [15| , mathematical statistics [16[ , and in- 
formation compression [13]. The tail of the distribution 
of the maximum eigenvalue is important because it gives 
the probability of all eigenvalues being negative, which 
is often related to the stability condition of complicated 
systems [lj, ll5j . 

A well-known study by Tracy and Widom established 
the celebrated "1/6 law" [l8[ on small deviations in the 
maximum eigenvalue of random matrices. On the other 
hand, analytical studies [l9l - [2TI | of large deviations give 
estimations of the tails of probabilities in special cases 
such as the Gaussian orthogonal ensemble (GOE) and en- 
semble of random Wishart matrices. However, the tech- 
niques based on the Coulomb gas representation are dif- 
ficult to generalize to ensembles with other distributions 
of the components. Other results by mathematicians and 
physicists are also limited to special ensembles and/or 
give only the upper bound of the probabilities [T3]. Thus, 
an efficient numerical approach that enables exploration 
of extreme tails of density is necessary. 

We pro pose a method based on multicanonical Monte 
Carlo |22l f23j ] as a promising approach to the problem. 
As we will show in this study, quantitative results are ob- 
tained in examples of sparse random matrices and matri- 
ces whose components are subject to the uniform density. 
A similar method is used in [10( to calculate large devi- 
ations in the growth ratio of matrices. The paper (icj . 
however, focuses on applications in numerical analysis 
and does not compute large deviations in the largest 
eigenvalues. 

The organization of this paper is as follows: In Sec. [Til 
we summarize the multicanonical Monte Carlo algorithm. 
In Sec lIIIi we discuss how multicanonical Monte Carlo is 
used to calculate large deviations and show the results of 
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numerical experiments on the tails of the distribution of 
the largest eigenvalues. Sec. [IV] covers the computation 
of the probability that all eigenvalues are negative; as 
noted above, this is a typical application of the proposed 
method. In. Sec. [yj sparse matrices are treated. In 
Sec. I VI) concluding remarks are given. 

II. MULTICANONICAL MONTE CARLO 

Let us summarize the idea of multicanonical Monte 
Carlo [H, [23[. Assuming the energy E(x) of a state x, 
our task is to calculate the density D(E) of states defined 
by 

D{E) = J S(E(x) - E) dx, (1) 

where S is the Dirac <5-function, and J ■ ■ ■ dx denotes a 
multiple integral in the space of states x. 

A key quantity of multicanonical Monte Carlo is the 
weight function f(E) of the energy E. Performing dy- 
namic Monte Carlo sampling with the weight f(E(x)), 
we modify f(E) step- by-step until the marginal den- 
sity q(E) of E is almost flat in a prescribed interval 
^min < E < i^max- The initial form of f(E) is arbi- 
trary, and we can start, for example, from a constant 
function. Several methods are proposed for optimizing 
a univariate function f(E), among which a method pro- 
posed by Wang and Landau [H| is most useful and used 
in this paper. After a weight function f*{E) that gives 
a sufficiently flat q(E) is obtained, we compute an ac- 
curate estimate q*(E) of q(E) by a long simulation run 
with the weight f*(E(x)). Then, D(E) is estimated by 
the relation 

D{E) cx (E min <E< E max ). 

This simple algorithm has significant advantages over 
conventional methods for estimation of D(E). First, it 
realizes accurate sampling of tails of D(E) without esti- 
mating densities in the high-dimensional state space of 
x. Second, when we include the region of E with large 
values of D(E) in the interval E m [ n < E < i? max , the 
mixing of dynamic Monte Carlo is dramatically facili- 
tated. This "annealing" effect is the reason that multi- 
canonical Monte Carlo is successfully used to calculate 
thermal averages at low temperatures in the studies of 
spin glass [23| and biomolecules [25j . 

III. LARGE DEVIATIONS IN THE LARGEST 
EIGENVALUES 

An essential observation in the present approach is that 
the energy E in multicanonical Monte Carlo need not be 
an energy in the ordinary sense. That is, we can sub- 
stitute for E any quantity for which we are interested 



in its rare fluctuations or large deviations from the av- 
erage' similar approaches to other problems are found 
in@,l,|M3. 

In this study, we regard the maximum eigenvalue Ai (x) 
of a matrix x as a fictitious "energy" of the state x. Also, 
we can introduce an underlying density p(x) that gives 
the probability of x under random sampling. While p{x) 
is the uniform density in statistical mechanics, p(x) in the 
present case characterizes an ensemble of matrices. Here- 
after, we assume the factorization p(x) = Yiij Pij( x ij)- 
The normalized density D(X 1 ) of the states is written as 

D(Ai) - J S(X 1 (x)~X 1 )p(x)dx, (2) 

where we replace E(x) in (TTJ with Ai(a;). D(X\) is simply 
the probability distribution of Ai, whose extreme tails we 
are interested in. 

Now the application of multicanonical Monte Carlo 
is straightforward. We employ a Metropolis-Hastings 
algorithm to generate samples according to the weight 
f (Xi(x))p(x) . A single component of the random ma- 
trix x is chosen and changed at each step; in ensembles of 
symmetric matrices, Xji should also be changed if i j, 
which is necessary to keep the symmetry of the matrix. 
The candidate x™f w of is generated according to the 
proposal density r<j \x^ w \x old ), where x old is the current 
value of x; x"f w is accepted if and only if the Metropolis 
ratio 

PvK ew ) nA* old K ew ) f{Xi(x new )) 

Pijixff) n 3 (x™ w \x° ld ) /(A!(x° M )) 

is smaller than a random number uniformly distributed 
in (0,1]. Repeating this procedure, the function /(Ai) 
is tuned by the method of Wang-Landau (24[. Once a 
weight function f*(X\) that gives a sufficiently flat q{X\) 
is obtained, we estimate D(X±) using the formula 

D(Xx) cx (Af 1 < A X < AT X ), 

where q*(Xi) is the density of Ai estimated by a long run 
with the fixed weight function /*(Ai). 

A simple choice of the proposal density is 
Tij (x™ } ew \x old ) = pij(x™ ew ), which results in a sim- 
ple form of the Metropolis ratio 

ff _ /(Ai (»""")) 
f{Xi(x old )) ■ 

However, this choice may not be adequate in some cases 
where the support of the densities pij(xij) is not finite, 
because very large deviations in an element Xij Cclll be 
relevant for large deviation of the largest eigenvalue. An 
alternative choice is to use r i3 ■ l {x^ w \x old ) = f i j{x^ J ew — 
x°j d ), where is an even function; hereafter we will 

call an algorithm using this proposal density as a random 
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FIG. 1. Density D(Xi) in GOE. Results of the proposed 
method and a naive random sampling method are compared 
for N = 64 and 128. Both almost overlap with N 1 ^' scaling. 
The symbols + and x appear only in the region where naive 
random sampling gives meaningful results. 



walk scheme. In this case, the Metropolis ratio is given 
by 

PnWr )/(Ai(^ e ")) 
P^lf) f(^ old )) ' 

We test the proposed method with the Gaussian or- 
thogonal ensemble (GOE); GOE is an ensemble of real 
symmetric matrices whose entries are independent Gaus- 
sian variables [Hj]. The Householder method is used to 
diagonalize the matrix in each step; it is also used in other 
examples in this paper. We employ two different forms 
of the proposal density: (1) r ij {x^ w \x old ) = p %j {x™ w ) 
and (2) r ij {x^ w \x old ) = Pij{x^ w - xf/); the latter 
is a special case of the random walk scheme, where 
^ij(') — Pij(') I^H- No significant difference between (1) 
and (2) is seen in our study. 

In Fig.[TJ a result estimated with the proposed method 
with nj(x™f w \x old ) = Pij{xff w ) is compared with the 
corresponding result of naive random sampling. The to- 
tal number of matrix diagonalizations is 4.5 x 10 8 for 
N = 64 and 2.25 x 10 8 for N = 128; they arc the 
same in both of the proposed method and naive ran- 
dom sampling. In the proposed method, two third of 
them are used to optimize the weight, while the rest is 
used to calculate the estimates. We confirmed that mod- 
ification factors in the Wang-Landau method are suffi- 
ciently close to unity at the end of the weight optimiza- 
tion. For N = 64, we also apply the random walk scheme 
nj(xff w \x old ) = Pij(xff w -xff) and obtain the same re- 
sult, but the computational time increases. The results 
in Fig. Q] show that the proposed method enables us to 
estimate the tails of the density down to ~ 10~ 15 , which 
is scarcely sampled by the naive method. 



IV. THE PROBABILITY THAT ALL 
EIGENVALUES ARE NEGATIVE 

The proposed strategy also allows us to calculate the 
probability P(Vi, A; < 0) that all eigenvalues of a random 
matrix are negative, which is important in applications in 
a variety of fields [lj, EB|- Using the relation Vz, A^ < Ai, 
this probability is calculated by 

P(Vi,\i < 0) = / D(Ai)dAi 

Here we assume that the density -D(Ai) of the maximum 
eigenvalue is estimated in an interval [A™ 111 , A™ 51 *] by the 
proposed method. The probabilities P(Ai < A™ 111 ) and 
P(A I ] nax < Ai) are also assumed to be negligibly smaller 
than P(Xf n < Ai < 0) and P(Af in < Ai < X?**), 
respectively. 

The probability that all eigenvalues are positive can 
also calculated with a similar way; it coincides with the 
probability that all eigenvalues are negative when the 
distribution of components is symmetric with respect to 
the origin. 

First, we test the proposed method with GOE, where 
the asymptotic behavior of PNLXi < 0) for large N is 
given by Dean and Majumdar [ijj [2(| as 

P(Vi, Xi < 0) - exp (-aN 2 ) . 

where a = ^ = 0.274653 • • • . This expression is de- 
rived by interpreting the eigenvalues as a Coulomb gas, 
a method that obviously does not apply general distribu- 
tion of components of matrices. 

Confirming the result by numerical methods is difficult 
because we should sample very rare events to estimate 
the tails of the distribution. Dean and Majumdar [l9|, 
[2pj (and Aazami and Easther [l^]) provided numerical 
results by naive random sampling, but their results are 
limited to small N, such as TV = 7 in (N = 8 with 

an additional assumption [201). Dean and Majumdar also 
did numerical computation up to N — 35 based on the 
Coulomb gas representation; their computation does not, 
however, provide an independent check to the theory and 
cannot be generalized to an arbitrary ensemble. 

Fig. [2] shows our numerical results for GOE. We can 
treat matrices up to JV = 40, which is not treated by 
naive random sampling. Here we use the random walk 
scheme r tJ (xff w \x old ) = Pij(x?f w - xf d ) and the total 
number of matrix diagonalizations is 3 x 10 9 for N = 4 ~ 
20 and 4.5 x 10 9 for N = 22 - 40. The results for N < 7 
coincide with those by naive random sampling. They 
are also consistent with the fit -0.272iV 2 - 0.4937V + 
0.244 of the numerical results calculated in [2(| with the 
Coulomb gas representation. Hence, probabilities as tiny 
as ~ 10 -200 are estimated by the proposed method and 
agree well with the known results. 

Next, to show the flexibility of the proposed method, 
we calculate the probability P(Vi, Ai < 0) for an ensem- 
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FIG. 2. Probability P(Vi, A, < 0) for GOE versus size TV 
of the matrices. The results of the proposed method (+) 
and the naive random sampling method (0) are shown. The 
results of naive random sampling are available only in the 
region TV < 7. The curve indicates a quadratic fit to the 
results with the Coulomb gas representation given in Dean 
and Majumdar[l|. 



ble of real symmetric matrices whose entries are indepen- 
dently distributed with the uniform distribution Pij(xij) 
denned by 

\xij\ < %/3L and i = j, 

vfe, M < yJ\L and i^j, 
else. 

Hereafter, the value of the parameter L is unity, which 
fits the variances of the components to those of the GOE. 
Results of the proposed method for this ensemble are 
shown in Fig. [3] The proposal density rij(x^ w \x old ) — 
Pij( x ?f w ) i s used. Total number of matrix diagonaliza- 
tions is 4.5 x 10 9 for each value of iV; two third of which 
are used to optimize the weight. Fitting the results yields 
asymptotic behavior of the probability, 

P(Vi, Xi < 0) - exp (-aTV 2 -bN-c) 

for large TV, where a = 0.679, — b = 4.76, and c = 17.31. 
As shown in Fig. [31 these probabilities significantly differ 
from that for the GOE with the same variance. 



V. SPARSE RANDOM MATRICES 

We also study ensembles of sparse random matrices. 
Once the matrices become sparse, the Coulomb gas ap- 
proach is not applicable even in Gaussian cases. The 
proposed approach allows us to calculate the probability 
P(Vi, Xi < 0) in these cases. In this section, we use the 
proposal density r i3 (x^ w \x old ) = pij(x'2f w ), but some of 
the results are also checked by random walk schemes. 
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FIG. 3. Probability P(Vz, Xi < 0) for an ensemble of matrices 
whose components are uniformly distributed. The horizontal 
axis corresponds to the size TV of the matrices. The results 
of the proposed method (+) and the naive random sampling 
method (0) are shown. The results of naive random sampling 
are shown for 4 < TV < 7. The solid curve indicates the 
probability for the GOE with the same variance. 



Various ways of defining sparse random matrices are 
available. Among them, we consider two types of def- 
initions in this study. The first is as follows: (1) The 
matrix is symmetric. (2) All diagonal entries are — 1. 
(3) Nonzero off-diagonal entries in the upper half of 
the matrix are mutually independent Gaussian variables 
with zero mean and unit variance. (4) Total number of 
nonzero entries is fixed at 7TV, where 7 is the average 
number of nonzero entries per row. (5) The positions 
of nonzero off-diagonal entries in the upper half of the 
matrix are randomly chosen. 

The total number of nonzero components should be 
preserved with this definition. Hence, the single compo- 
nent update in previous sections is replaced by a trial of 
exchanging zero and nonzero components with resam- 
pling of the nonzero component. Other parts of the 
algorithm remain essentially the same. An example of 
the density D(Xi) computed by this modified method is 
shown in Fig. 2) 

The probability P(Vi, Xi < 0) that all eigenvalues are 
negative is also successfully calculated by this algorithm 
for 7 = 3,4, and 5, as shown in Fig. [SJ These results 
indicate that for sparse random matrices, the probability 
P(Xi < 0,Vi) behaves as 

P(Vi, A, < 0) - exp (-a 7 TV) 

for large TV, where the estimated values of the constants 
a 7 are (13 = 0.68, 04 = 1.20, and 05 = 1.81 for 7 — 3,4, 
and 5, respectively. 

In the case of sparse matrices, the log-probability 
logP(Vi,A; < 0) is linear in TV, which is apparently dif- 
ferent from the behavior proportional to TV 2 seen in the 
previous two examples. However, if we plot the proba- 
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FIG. 4. Density -D(Ai) in a case of sparse random matrices. 
The first definition is applied; results of the proposed method 
and the naive random sampling method are compared for TV = 
30 and 7 = 3. The symbol + appears only in the region where 
naive random sampling gives nonzero results. 
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FIG. 5. Probabilities P (Vi, Xi < 0) for an ensemble of sparse 
random matrices estimated by the proposed method. The 
first definition is applied; the results with 7 = 3, 4, and 5 
versus size N of the matrices are shown. The lines show linear 
fits of the data. 



mean and unit variance, respectively; each component is 
assumed to be an independent sample from this distribu- 
tion. 

In this case, all components are mutually independent 
and the modification for keeping the number of nonzero 
components is not necessary. However, since the diago- 
nal elements can vanish, singular behavior of the density 
D(Ai) of states appears at Ai = 0, which affects the effi- 
ciency of the proposed method. 

Fortunately, when we are interested in P(Vi,Ai < 0), 
this difficulty is easily treated; we use the fact that 
the condition that all diagonal elements are negative 
Vi,xa < is a necessary condition for Vi, Aj < 0. By 
using this condition, the following two-stage method is 
introduced. First, we calculate the conditional probabil- 
ity P(Vi,Ai < | Vi, Xa < 0). This conditional probabil- 
ity can be calculated with a multicanonical algorithm, in 
which we reject any state 3i, Xu > 0. The second step 
is to calculate the probability P(Vi, xu < 0). Elementary 
calculation shows that 

/I \ N ( \ N 
P(Vi, Xii <0)=(-) x(-L\ . (3) 

Then, the probability P(Vi, Xi < 0) is given by the prod- 
uct P(Vi, A, < | Vi,xu < 0) x P(Vi, x u < 0). 

Fig. [5] shows examples of the probability 
P(Vz, Xi < 0| Vi, Xu < 0) calculated in the first step; it 
is linear in N in the semi-log scale, as expected. The 
probability P(Vi, < 0) obtained from it is shown in 
Fig. [3 In this case, logP(yi,Xa < 0) is no longer linear 
in N because of an 0(N log N) term arising from |3]). 
They are fitted as 

P(yi,x u < 0) ~ (^) JV exp(-a 7 /V-) : 

where 03 = 0.845, = 1.14, 05 = 1.44, and = 1.75 
for 7 = 3,4, 5, and 6, respectively. 



VI. CONCLUDING REMARKS 



bility P(Vi, Aj < 0) with the number M of nonzero com- 
ponents instead of the size TV, the dependence is linear 
in all examples. Because M oc N in a sparse case and 
M oc N 2 in a dense case, the obtained results are natu- 
rally explained. 

The definition of sparse random matrices most frequent 
in the literature [13] differs from that given above. Here, 
a second definition of sparse random matrices is given by 
assigning the probability 

Pij&ij) = (l - jf) + ^(^j) 

to all components Xij, (i>j), where 8 and it denote 
Dirac's delta function and a Gaussian density with zero 



A method based on multicanonical Monte Carlo is pro- 
posed and applied to the estimation of large deviations in 
the largest eigenvalue of random matrices. The method is 
successfully tested with the Gaussian orthogonal ensem- 
ble (GOE), an ensemble of matrices whose components 
are uniformly distributed in an interval, and an ensem- 
ble of sparse random matrices. The probabilities that all 
eigenvalues of a matrix are negative are successfully es- 
timated in cases where naive random sampling is largely 
ineffective; the smallest values of the obtained probabili- 
ties are ~ 10~ 200 . 

The method can be applied to any ensemble of matri- 
ces. Moreover, it enables sampling of rare events defined 
by any statistics. Hence, it will be interesting to apply 
the method to large deviations in other quantities, such 
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as statistics involving eigenvectors or spacing of eigenval- 



ues. 
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FIG. 6. Probabilities P(Vi, \ t < | Vi, xu < 0) for an en- 
semble of sparse random matrices estimated by the proposed 
method. The second definition is applied; the results with 
7 = 3, 4, 5, and 6 versus size N of the matrices are shown. 
The lines show linear fits to the data. The results of naive 
random sampling are also shown. 
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FIG. 7. Probabilities P(Vi, Xi < 0) for an ensemble of sparse 
random matrices estimated by the proposed method. The 
second definition is applied; the results with 7 = 3, 4, 5, and 
6 versus size N of the matrices are shown. 
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